The reweighting technique, where samples from one probability distribution are adjusted with a weight factor to compute averages from a different (but usually very similar) distribution, is biased for a finite number of samples.
The details can be found here.
Question for the audience: Is this known? I don't recall seeing this mentioned any time the method has been presented.
Some possible routes to fix the problem are listed in the paper, but alas, I haven't been able to turn any of them into a useful solution yet.
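As a quick illustration of the effect (a toy sketch of my own, not the setup from the paper — the Gaussians, the shift mu = 0.5, and the sample sizes are all assumptions for the demo), here is the finite-N bias of the standard reweighted (ratio) estimator, reweighting samples from N(0,1) to estimate the mean of N(0.5,1):

```python
import numpy as np

rng = np.random.default_rng(0)

def reweighted_mean(n):
    """Standard reweighted (ratio) estimator of E_q[x], using n samples
    drawn from p = N(0, 1) and weights w = q/p with q = N(0.5, 1)."""
    x = rng.standard_normal(n)
    # For unit-variance Gaussians, q(x)/p(x) = exp(mu*x - mu^2/2), mu = 0.5.
    w = np.exp(0.5 * x - 0.125)
    return np.sum(w * x) / np.sum(w)

# Average many independent runs: the systematic offset from the true
# answer 0.5 is nonzero at finite n and shrinks as n grows.
for n in (10, 100, 1000):
    trials = [reweighted_mean(n) for _ in range(20000)]
    print(n, np.mean(trials) - 0.5)
```

Each individual run is a ratio of two sums, and even though numerator and denominator are separately unbiased, their ratio is not — the average over runs does not sit on the true answer for small n.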
8 comments:
Hi Mark,
Is it possible for readers to upload pdf files?
Thanks
Arjun R. Acharya
I don't think it is possible to easily post pdf files to blogger. Search Google for "blogger upload pdf" for some methods. (The most straightforward is to host the file elsewhere and link to it.)
Hi Mark,
Some very brief thoughts :
http://aracharya.googlepages.com/scalingofreweightingbias.pdf
After having typed the above, I see that you mentioned the 1/x as the source of the bias, and also that you mention calculating the integrals F as an expansion in powers of (1/N).
Arjun R. Acharya
Hi Arjun,
Thanks for posting that derivation for the 1/N dependence. The main question with using a fitting-and-extrapolation technique is whether the values for different N need to be computed independently, or whether a single run is sufficient.
Also, since my original post, I realized that the correlations between the numerator and denominator (from using the same weights) are also important in the bias. One way to correct this is to use a separate set of sample points for the numerator and the denominator. The combination of this and the correction for 1/D can produce a reduced bias method.
When comparing results, I suspect the mean-square error might be a better measure of estimator quality (for a given amount of data).
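To make the shared-weights correlation concrete, here is a hypothetical toy comparison (the Gaussian setup, parameters, and function names are my own assumptions, not from this thread) of the usual estimator against the split-sample variant suggested above:

```python
import numpy as np

rng = np.random.default_rng(1)

def weights(x, mu=0.5):
    # Importance weights q/p for p = N(0,1), q = N(mu,1).
    return np.exp(mu * x - 0.5 * mu * mu)

def shared(n):
    # Usual estimator: numerator and denominator share one sample set,
    # so their fluctuations are correlated.
    x = rng.standard_normal(n)
    w = weights(x)
    return np.sum(w * x) / np.sum(w)

def split(n):
    # Variant suggested above: independent sample sets for numerator
    # and denominator remove the covariance part of the bias.
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    return np.sum(weights(x1) * x1) / np.sum(weights(x2))

for est in (shared, split):
    trials = [est(20) for _ in range(20000)]
    print(est.__name__, np.mean(trials) - 0.5)  # bias vs the true mean 0.5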
Hi Mark,
Yes, I had noticed this correlation, and in an earlier attempt I tried using the independence property E[f(x)g(x')] = E[f(x)]E[g(x')] (where x, x' are independent sets of configurations derived from the same simulation) to find a g() that was a better estimator of 1/E[w(x')], but was unsuccessful! (E = expectation with respect to P.)
This is definitely the first time I have seen anyone mention the bias in reweighting.
Just to note that what I was trying to do above is clearly impossible. For example, if one considers a normally distributed variable x with variance s and mean m, and we have an estimator which yields E[g(x)] = 1/m, then we have constraints on the moments of g:

E[g(x)] = 1/m  (Eq 1)
E[xg(x)] = s((1/s) - (1/m^2))  (obtained by differentiating Eq 1 w.r.t. m)
E[x^2 g(x)] = ... etc.

So g(x) can be expressed as a power series (say about 1/m) whose coefficients depend on m and s, which a priori we do not know. Furthermore, these coefficients will depend on the distribution, so we are going down the wrong path!
I was also thinking of using Jensen's inequality to derive estimators whose expectations bound the desired quantity; e.g. we know that E[1/x] > 1/E[x] for a positive variable x (and I also thought about geometric vs. arithmetic means). But again, a little thought shows that neither is of much use for this particular case.
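For what it's worth, the Jensen bound mentioned above is easy to check numerically (a toy example with an assumed uniform positive variable):

```python
import numpy as np

rng = np.random.default_rng(3)

# Jensen's inequality for the convex function 1/x on positive x:
# E[1/x] >= 1/E[x], so the sample average of 1/x over-estimates 1/E[x].
x = rng.uniform(0.5, 1.5, 100_000)
print(np.mean(1.0 / x), 1.0 / np.mean(x))
```

The bound only tells us the direction of the discrepancy, not its size, which is why it doesn't lead to a corrected estimator here.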
Regarding what I typed: if the sampled points used for the numerator and denominator are independent, then the covariance contribution to the (1/N) bias vanishes, since the expectation of [sum w - N] times the (independent) numerator factorizes and vanishes. We are left with the [sum w - N]^2 term, whose expectation is N Var(w), so a (1/N) contribution from the fluctuations of the denominator alone remains.
Since we are on the subject of expectations, I did come across this page with
a whole load of different types of means, which might be of interest :
http://www.algebra.com/algebra/homework/Average/Mean.wikipedia
You might want to have a look at Phys. Rev. E 73, 056706 (2006). We discuss this bias there, mostly in the context of simple multivariate distributions, examining it as a function of the dimensionality of the system.
Lee Warren
gwarren[@]udel.edu